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Abstract: The persistent homology with coefficients in a field F coincides with the same for 
cohomology because of duality. We propose an implementation of a recently introduced algorithm 
for persistent cohomology that attaches annotation vectors with the simplices. We separate the 
representation of the simplicial complex from the representation of the cohomology groups, and 
introduce a new data structure for maintaining the annotation matrix, which is more compact 
and reduces substantially the amount of matrix operations. In addition, we propose heuristics to 
simplify further the representation of the cohomology groups and improve both time and space 
complexities. The paper provides a theoretical analysis, as well as a detailed experimental study 
of our implementation and comparison with state-of-the-art software for persistent homology and 
cohomology. 
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Une Structure de Donnees Efficace pour le Calcul de 
Cohomologie Persistente 

Resume : Le calcul d'homologie et de cohomologie persistentes dans un corps K coincident 
par dualite. Nous proposons une implementation de l'algoritlime de cohomologie persistente, 
qui associe un vecteur d'annotation a chaque simplexe du complexe simplicial. Nous separons 
la representation du complexe de celle des groupes de cohomologie, et nous introduisons une 
nouvelle structure de donnees pour representer les annotations. Cette structure est plus compacte 
et permet de reduire le nombre d'operations dans la matrice. Nous introduisons egalement des 
heuristiques pour reordonner les simplexes, ceci pour simplifier les groupes de cohomologie et 
ameliorer les performances en temps et espace de l'implementation. Cet article fournit une analyse 
de complexite ainsi qu'une etude experimentale detaillee de 1'implementation. Nous comparons 
notamment notrc implementation avec les librairies disponibles de calculs d'homologie et de 
cohomologie persistentes. 
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1 Introduction 

Persistent homology |H] is an algebraic method for measuring the topological features of a space 
induced by the sublevel sets of a function. Its generality and stability with regard to noise 
have made it a widely used tool for the study of data, where it does not need any knowledge 
a priori. A common approach is the study of the topological invariants of a nested family of 
simplicial complexes built on top of the data, seen as a set of points in a geometric space. This 
approach has been successfully used in various areas of science and engineering, as for example 
in sensor networks, image analysis, and data analysis where one typically needs to deal with big 
data sets in high dimensions. Consequently, the demand for designing efficient algorithms and 
implementation to compute the persistent homology of filtered simplicial complexes has grown. 

The first persistence algorithm [10113] can be implemented by reducing a matrix defined by 
face incidence relations, through column operations. The running time is 0(m 3 ) where m is 
the number of simplices of the simplicial complex and, despite good performance in practice, 
Morozov proved that this bound is tight [T2] . Recent optimizations taking advantage of the 
special structure of the matrix to be reduced have led to significant progress in the theoretical 
analysis |11I4| as well as in practice |4llj . 

A different approach |7l6j interprets the persistent homology groups in terms of their dual, 
the persistent cohomology groups. The cohomology algorithm has been reported to work better 
in practice than the standard homology algorithm [6] but this advantage seems to fade away 
when compared to the recent optimized homology algorithms [T]. An elegant description of the 
cohomology algorithm, using the notion of annotations, has been introduced in [5] and used to 
design more general algorithms for maintaining cohomology groups under simplicial maps. 

In this work, we propose an implementation of the annotation-based algorithm for computing 
persistent cohomology. A key feature of our implementation is a distinct separation between the 
representation of the simplicial complex and the representation of the cohomology groups. In 
our implementation, the simplicial complex can be represented either by its Hasse diagram or 
by using the more compact simplex tree [2|. The cohomology groups are stored in a new data 
structure that represents a Compressed version of the Annotation Matrix. As a consequence, the 
time and space complexities of our algorithm depend mostly on properties of the cohomology 
groups we maintain along the computation and only linearly on the size of the simplicial complex. 

Moreover, maintaining the simplicial complex and the cohomology groups separately allows 
us to reorder the simplices while keeping the same persistent cohomology. This significantly re- 
duces the size of the cohomology groups to be maintained, and improves considerably both the 
time and memory performance as shown by our detailed experimental analysis on a variety of ex- 
amples. Our method compares favourably with state-of-the-art software for computing persistent 
homology and cohomology. 

Background: A simplicial complex is a pair K, = (V, S) where V is a finite set whose elements 
are called the vertices of K and S is a set of non-empty subsets of V that is required to satisfy 
the following two conditions : 1. p € V =► {p} G S and 2. a E S, t C cr => r G S". Each element 
a E S is called a simplex or a face of K and, if a G S has precisely s + 1 elements (s > —1), a 
is called an s-simplex and its dimension is s. The dimension of the simplicial complex K, is the 
largest k such that S contains a fc-simplex. We define KP to be the set of p-dimensional simplices 
of fC, and note its size \K P \. Given two simplices r and a in /C, r is a subface (resp. coface) of 
a if r C a (resp. t3it). The boundary of a simplex a, denoted da, is the set its subfaces with 
codimcnsion 1. 

A filtration |9_ of a simplicial complex is an order relation on its simplices which respects 
inclusion. Consider a simplicial complex K. — (V, S) and a function p : S — > R. We require p to 
be monotonic in the sense that, for any two simplices r C a in /C, p satisfies p(r) < p{a). We 
will call p(a) the filtration value of the simplex a. Monotonicity implies that the sublevel sets 
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/C(r) = p~ l {— oo, r] are subcomplexes of /C, for every r £ R. Let m be the number of simpliccs 
of /C, and let (pi)i=i... n be the n different values p takes on the simplices of /C. Plainly n < m, 
and we have the following sequence of n + 1 subcomplexes: 

= ICq C ■ ■ • C K n — K, -oo = po < ••• < Pn, fci = /9 _1 (-oo,/9,] 

Applying a (co)homology functor to this sequence of simplicial complexes turns (combina- 
torial) complexes into (algebraic) abelian groups and inclusion into group homomorphisms. 
Roughly speaking, a simplicial complex defines a domain as an arrangement of local bricks 
and (co)homology catches the global features of this domain, like the connected components, 
the tunnels, the cavities, etc. The homomorphisms catch the evolution of these global features 
when inserting the simplices in the order of the filtration. Let H p {K.) and H P (1C) denote respec- 
tively the homology and cohomology groups of K, of dimension p with coefficients in a field F. 
The filtration induces a sequence of homomorphisms in the homology and cohomology groups in 
opposite directions: 



= H P (K ) -> ff p (£i) -> ■ 


■ • -> ff p (/C»_i) -> H p {K n ) = H P (JC) 


(1) 


O = ff p (/C )^ff p (/Ci) <-•• 


■ <- HP(/C n ^) <- ff p (/C„) = ff p (/C) 


(2) 



We refer to [9] for an introduction to the theory of (co)homology and persistent homology. 
Computing the persistent homology of such a sequence consists in pairing each simplex that 
creates a homology feature with the one that destroys it. The usual output is a persistence 
diagram, which is a plot of the points (p(r), p(cr)) for each persistent pair (r, a). It is known that 
because of duality the two sequences above provide the same persistence diagram [7J. 

The original persistence algorithm [TU] considers the homology sequence in Equation fl] that 
aligns with the filtration direction. It detects when a new homology class is born and when an 
existing class dies as we proceed forward through the filtration. Recently, a few algorithms have 
considered the cohomology sequence in Equation [2] which runs in the opposite direction of the 
filtration [768. The birth of a cohomology class coincides with the death of a homology class 
and the death of a cohomology class coincides with the birth of a homology class. Therefore, 
by tracking a cohomology basis along the filtration direction and switching the notions of births 
and deaths, one can obtain all information about the persistent homology. The algorithm of de 
Silva et al. [7 computes the persistent cohomology following this principle which is reported to 
work better in practice than the original persistence algorithm [6]. Recently, Dey et al. |8] recog- 
nized that tracking cohomology bases provides a simple and natural extension of the persistence 
algorithm for filtrations connected with simplicial maps. Their algorithm is based on the notion 
of annotation and, when restricted to only inclusions, is a re-formulation of the algorithm of de 
Silva et al. |7J. Here we follow this annotation based algorithm. 

2 Persistent Cohomology Algorithm and Annotations 

In this section, we recall the annotation-based persistent cohomology algorithm of [8 . It maintains 
a cohomology basis under simplex insertions, where representative cocycles are maintained by the 
value they take on the simplices. We rephrase the description of this algorithm with coefficients 
in an arbitrary field F, and use classic field notations (F, +, •, — , /, 0, 1). 

Definition 1. Given a simplicial complex K,, let JC P denote the set of p-simplices in JC. An 
annotation for )C P is an assignement a p : K. p — > ¥ 9 of an ¥-vector a a = a p (<r) of same length g 
for each p- simplex a G JC P . We use a when there is no ambiguity in the dimension. We also have 
an induced annotation for any p-chain c = ^2 t fiOi given by linear extension: a c = J^ /j • a CTi . 
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Definition 2. An annotation a : JC P —} F 9 is valid if: 

1. g = rankH p ()C), and 

2. two p-cycles z\ and Zi have a Zl = a Z2 iff their homology classes [z{\ and [zq\ are identical. 

Proposition 1 Q8J). The following two statements are equivalent: 

1. An annotation a : KP — \ F 9 is valid 

2. The cochains {4>j}j=i-..g given by fa(cr) = a a [j] for all a € KP are cocycles whose cohomology 
classes {[4>jWj=i---g constitute a basis of H P (JC). 

A valid annotation is thus a way to represent a cohomology basis. The algorithm for comput- 
ing persistent cohomology consists in maintaining a valid annotation for each dimension when 
inserting all simplices in the order of the filtration. Since we process the filtration in a direction 
opposite to the cohomology sequence (as in Equation [2]) , we discover the death points of coho- 
mology classes earlier than their birth points. To avoid the confusion, we still say that a new 
cocycle (or its class) is born when we discover it for the first time and an existing cocycle (or its 
class) dies when we see it no more. 

We present the algorithm and refer to [5] for its validity. We insert simplices in the order 
of the filtration. Consider an elementary inclusion Ki <— » ICi U {<r}, with a a p-simplex. Assume 
that to every simplex r of any dimension in ICi is attached an annotation vector a T from a valid 
annotation a of /C». We describe how to obtain a valid annotation for ICi U {a} from that of ICi. 
We compute the annotation ag a for the boundary da in ICi and take actions as follows: 
Case 1: If ag a = 0, g <— g + 1 and the annotation vector of any p-simplex r 6 ICi is augmented 
with a entry so that a T = [/i, • • • , f g ] T becomes [/i, • • • ,f g , 0] T . We assign to the new simplex 
a the annotation vector a ff = [0, • • • , 0, 1] T . According to Proposition nj this is equivalent to 
creating a new cohomology class represented by far) = for r ^ a and 4>(<t) = 1. 
Case 2: If a,^ 7^ 0, we consider the non-zero element Cj of ag CT with maximal index j. We now look 
for annotations of those (p — l)-simplices r that have a non-zero element at index j and process 
them as follows. If the clement of index j of a r is / 7^ 0, we add —f/cj ■ aoa to a r . Note that, in the 
annotation matrix whose columns arc the annotation vectors, this implements simultaneously a 
series of elementary row operations, where each row fa receives fa <— fa — (ag a [i]/cj) x fa:. As a 
result, all the elements of index j in all columns are now and hence the entire row j becomes 0. 
We then remove the row j and set g ■<— g — 1. a is assigned a a = 0. According to Proposition [T] 
this is equivalent to removing the j th cocycle 4>j(t) = a T [j]. 

As with the original persistence algorithm, the pairing of simplices is derived from the creation 
and destruction of the cohomology basis elements. 

3 Data Structures and Implementation 

In this section we present our implementation of the annotation-based persistent cohomology 
algorithm. We separate the representation of the simplicial complex from the representation of 
the cohomology groups. 

3.1 Representation of the Simplicial Complex 

We represent the simplicial complex in a data structure SC equipped with the operation Compute- 
BOUNDARy(ct) that computes the boundary of a simplex a. We denote by Cq the complexity of 
this operation where p is the dimension of a . Additionally, the simplices are ordered according 
to the filtration. 

Two data structures to represent simplicial complexes are of particular interest here. The first 
one is the Hasse diagram, which is the graph whose nodes are in bijection with the simplices (of 
all dimensions) of the simplicial complex and an edge links two nodes representing two simplices 
t and a iff r C a. and the dimensions of r and a differ by 1. The second data structure is 
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Fig. 1. Compressed annotation matrix of a matrix with integer coefficients. 



the simplex tree introduced in [2J, which is a specific spanning tree of the Hasse diagram. For a 
simplicial complex K. of dimension k and a simplex a G K, of dimension p, the Hasse diagram has 
size 0(fc|/C|) and allows to compute Compute-BOUNDARY(ct) in time 0(p), whereas the simplex 
tree has size 0(|/C|) and allows to compute Compute-BOUNDARY(ct) in time 0(p 2 D m ), where 
D m is a small value related to the time needed to traverse the simplex tree. 

3.2 The Compressed Annotation Matrix 

For each dimension p, the p th cohomology group can be seen as a valid annotation for the p- 
simplices of the simplicial complex. Hence, an annotation a : KP — > F 9 can be represented as a 
g x \K. P \ matrix with elements in F, where each column is an annotation vector associated to a 
p-simplex. We describe how to represent this annotation matrix in a efficient way. 
Compressing the annotation matrix: In most applications, the annotation matrix is sparse 
and we store it as illustrated in Figure [l] A column is represented as the singly-linked list of its 
non-zero elements, where the list contains a pair (i, /) if the i th element of the column is / ^ 0. 
The pairs in the list are ordered according to row index i. All pairs (i, /) with same row index i 
are linked in a doubly- linked list. 

Removing duplicate columns: (see Figure 111 To avoid storing duplicate columns, we use 
two data structures. The first one, AV P , stores the annotation vectors and allows fast search, 
insertion and deletion. AV P can be implemented as a red-black tree or a hash table. We denote 
by C^y the complexity of an operation in AV P . The simpliccs of the same dimension that have 
the same annotation vector are now stored in a same set and the various (and disjoint) sets are 
stored in a union- find data structure denoted UF V . UJ- p is encoded as a forest where each tree 
contains the elements of a set, the root being the "representative" of the set. The trees of UJ- P 
are in bijection with the different annotation vectors stored in AV P and the root of each tree 
maintains a pointer to the corresponding annotation vector in AV P . Each node representing a 
p-simplex a in the simplicial complex SC stores a pointer to an element of the tree of lAJ :p which 
is associated to its annotation vector a CT . Finding the annotation vector of a consists in getting 
the element it points to in a tree of 1AT P and then finding the root of the tree which points to a a 
in AV P . We avail the following operations on UJ- p : 

• Create-SET: creates a new tree containing one element. 

• FlND-ROOT: finds the root of a tree, given an element in the tree. 

• Union-sets: merges two trees. 

The number of elements maintained in WF P is at most the number of simplices of dimension 
p, i.e. \IC P \. The operations FlND-ROOT and Union-sets on UT P can be computed in amortized 
time 0(a(|/C p Q), where a(n) is the very slowly growing inverse Ackcrmann function (constant 
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less than 4 in practice), and Create-SET is performed in constant time. We will refer to this 
data structure as the Compressed Annotation Matrix. 

Operations: The compressed annotation matrix described above supports the following opera- 
tions. Complexities are expressed for an annotation matrix with g rows and s distinct columns: 

• SuM-ANN(ai, 82): computes the sum of two annotation vectors ai and a2, and returns the 
lowest non-zero coefficient if it exists. The column elements are sorted by increasing row index, 
so the sum is performed in 0(g) time. 

• Search-ANN/Add-ANn/Remove-ANN (a): searches, adds or removes an annotation vector 
a from AV P in 0{C P AV ) time. 

• Create-COCYCLe() implements Case 1 of the algorithm described in section [2] It inserts 
a new column in AV P containing one element (i n0 w, 1), where i ncw is the index of the created 
cocycle. This is performed in time 0(C AV ). We also create a new disjoint set in UJ- V for the new 
column. This is done in Oil) time using Create-SET. 

• KiLL-COCYCLE(ag CT , Cj, j) implements Case 2 of the algorithm. It finds all columns with a 
non-zero element at index j and, for each such column A, it adds to A the column —f/cj ■ zg a if / 
is the non-zero element at index j in A. To find the columns with a non-zero element at index j, 
we use the row doubly-linked list at index j. We call Sum-ANN to compute the sums. The overall 
time needed for all columns is 0(gs) in the worst-case. Finally, we remove duplicate columns 
using operations on AV P (in 0(sC AV ) time in the worst-case) and call Union-sets on UJ :p if 
two sets of simplices, which had different annotation vectors before calling KlLL-COCYCLE, are 
assigned the same annotation vector. This is performed in at most 0(sa(\IC p \)) time. 

3.3 Computing Persistent Cohomology 

Given as input a filtered simplicial complex represented in a data structure SC, we compute the 
persistence diagram of the filtration. 

Implementation of the persistent cohomology algorithm: We insert the simplices in the 
filtration order and update the data structures during the successive insertions. The simplicial 
complex is stored in a simplicial complex data structure SC and we maintain, for each dimension 
p, a compressed annotation matrix, which is empty at the beginning of the computation. Let a 
be a p-simplex we insert. We compute da using Compute-BOUNDARY in SC (in 0(Cq) time), 
and find the annotation vectors of the boundary faces using p+1 calls to Find-ROOT in WJ-" P_1 
(in 0(pa(\K p ~ I)) time). We compute ag a by summing the annotation vectors of the faces in the 
boundary of a, using p+1 calls to SUM-ANN (in 0(pg) time). Depending on the value of ag a , we 
call either Create-COCYCLE (in 0(C AV ) time) or KlLL-COCYCLE (in 0(s-(g+C p A y +a(\K. p ~ 1 \))) 
time). The algorithm returns the persistence diagram. 
Complexity analysis: The complexity for inserting a of dimension p is: 

O (c p +p(a(\IC p - 1 \)+g)+C p AV + s( g + C p Av 1 +a(\IC p - 1 \))) 

Let k be the dimension of the simplicial complex and m its number of simplices. Let g m and s m 
be the maximal dimension of a cohomology group and the maximal number of distinct columns 
in the matrix, respectively, along the computation. The total cost for computing the persistent 
cohomology and the memory complexity for storing the compressed annotation matrices are 
respectively: 

O (m x [C g + k(a(m) + g m ) + $ m (g m + Cav + a ( m ))] ) an d O (m + kg m s m ) 

with Cg the complexity of Compute-BOUNDARY in SC, a(-) the inverse Ackermann function and 
Cav the complexity of an operation in AV. 
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Although s m is bounded by m and by 2 9m , it remains close to g m in practice, as illustrated 
in the experimental section. 

4 Filtration Strategies 

In this section, we show that we have some freedom in choosing the order in which we insert the 
simplices. We take advantage of this freedom to improve the performance of the algorithm. In 
section |4~T| we use the fact that the insertion of a simplex that creates a cocycle can be postponed 
until one of its cofaces is to be inserted. In section |4.2| we consider the important practical case 
where the filtration does not provide a strict order on the simplices, i.e. when some simplices 
have the same filtration value. In both cases, the optimization does not change the persistent 
cohomology of the simplicial complex. 

4.1 Lazy Evaluation 

We postpone the insertion of a simplex a that creates a cocycle until we consider one of its 
cofaces. Such a delayed insertion of a does not modify the behaviour of the algorithm since the 
annotation that is assigned to a does not affect any subsequent annotation updates until a coface 
of a appears. We give in Appendix |A.1| a formal proof that the lazy evaluation does not modify 
the persistent diagram. The lazy evaluation reduces the dimension g of the cohomology groups 
we maintain, as well as the number s of distinct annotation vectors. This consequently improves 
the time and space performance of the algorithm. 

We implement this lazy evaluation as follows. We mark each simplex whose insertion has 
been postponed (initially, no simplex is marked). As before, we call Lazy- EVALUATION, the lazy 
insertion procedure, on each simplex in the order of the filtration. Let a be a p-simplex on which 
we call Lazy-evaluation. If a is marked, Lazy-evaluation directly inserts it as a creator, 
without computing the annotation of its boundary, and unmarks a . If a is not marked, Lazy- 
evaluation computes the boundary da of a and is called recursively on each marked face of 
da. Lazy-evaluation then computes ag a and proceeds as follows. If ag a is non-zero (a kills a 
cocycle), we proceed as for the standard insertion of a and update the annotation. If ag CT is null 
(a creates a cocycle) , we simply mark a and postpone the insertion of a. The recursive calls to 
the marked subfaces guarantee that all subfaces of a have been inserted prior to the computation 
of ag a . Note that the lazy evaluation does not induce an additional cost since Lazy-evaluation 
is called at most twice on each simplex and we compute the boundaries only once. In section [5j 
we give experimental evidence that this lazy evaluation is effective at decreasing the maximal 
dimension of the cohomology groups we maintain. 

4.2 Ordering Iso-simplices 

Many simplices, called iso-simplices, may have the same filtration value. This situation is common 
when the filtration is induced by a geometric scaling parameter. Assume that we want to compute 
the cohomology groups H p (]Ci+i) from H p (JCi) where /C; C JCi+± and all simplices in /C,+i \ /Q 
have the same filtration value. Depending on the insertion order of the simplices of /Cj+i \ /C,;, 
the dimension of the cohomology groups to be maintained along the computation may vary a lot 
as well as the computing time, potentially leading to a computational bottleneck. We propose a 
heuristic to order iso-simplices that appears to be efficient in practice (section ^1 . 

Intuitively, we want to avoid the creation of many "holes" of dimension p and want to fill 
them up as soon as possible with simplices of dimension p+1. For example, in Figure [2j we want 
to avoid inserting all edges first, which will create two holes that will be filled when inserting 
the triangles. To do so, we look for the maximal faces to be inserted and recursively insert their 
subfaces. We conduct the recursion so as to minimize the maximum number of holes. In addition, 
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Fig. 2. Inclusion /Ci C Ki+i. Left: upward traversal (in green) from simplex {c}. The ordering of the 
maximal cofaces appears in blue. Right: downward traversal (in orange) from simplex {abc}. The ordering 
of the subfaces appears in blue. 



to avoid the creation of holes due to maximal simplices that are incident, maximal simpliccs 
sharing subfaces are inserted next to each other. We can describe the reordering algorithm in 
terms of a graph traversal. The graph considered is the graph of the Hasse diagram of /Q+i \ JCi, 
defined in section 3.1 (see Figure pi. 

Let o\ ■ ■ ■ o~i be the iso-simplices of /Cj+i \ JCi, sorted so as to respect the inclusion order. We 
attach to each simplex two flags, a flag F up and a flag Fd OW n, set to originally. When inserting a 
simplex <7j, we proceed as follows. We traverse the Hasse diagram upward in a depth-first fashion 
and list the inclusion-maximal cofaces of Oj in /C^+i \ /Q. The flags F up of all nodes traversed 
are set to 1 and the maximal cofaces are ordered according to the traversal. From each maximal 
coface in this order, we then traverse the graph downward and order the subfaces in a depth-first 
fashion: this last order will be the order of insertion of the simplices. The flags -Fdown of all 
traversed nodes are set to 1. We stop the upward (resp. downward) traversal when encountering 
a node whose flag F up (resp. -Fdown) is set to 1. We do not insert already inserted simplices cither. 

By proceeding as above on all simplices of the sequence a\ ■ ■ ■ erg, we define a new order which 
respects the inclusion order between the simplices. Indeed, as the downward traversal starts from 
a maximal face and is depth first, a face is always inserted after its subfaces. Every edge in the 
graph is traversed twice, once when going upward and the other when going downward. Indeed, 
during the upward traversal, at each node N associated to a simplex ajq, we visit only the edges 
between N and the nodes associated to the cofaces of a^ and, during the downward traversal, 
we visit only the edges between N and the nodes associated to the subfaces of ctat. If /Ci+i \ JC L 
contains (. simplices, the reordering takes in total 0(£ x (Cq +C co q)) time, where Cq (resp. C co q) 
refers to the complexity of computing the codimension 1 subfaces (resp. cofaces) of a simplex 
in the simplicial complex data structure SC. The reordering of the filtration can either be done 
as a preprocessing step if the whole filtration is known, or on-the-fly as only the neighboring 
simplices of a simplex need to be known at a time. It should also be observed that the reordering 
of iso-simplices is compatible with the lazy evaluation. The reordering of a set of iso-simplices 
respects the inclusion order of the simplices and the filtration, and therefore does not change 
the persistence diagram of the filtered simplicial complex. This is a direct consequence of the 
stability theorem of persistence diagrams [5_ (details in Appendix A.2|. However, it may change 
the pairing of simplices. 



5 Experiments 

In this section we report on the experimental performance of our implementation. Given the 
filtration of a simplicial complex as input, we measure the time taken by our implementation 
to compute the persistent cohomology of the filtration, as well as various statistics. We com- 
pare the timings with state-of-the-art software for computing persistent homology and cohomol- 
ogy. Specifically, we compare our implementation with the Dionysus library (www.mrzv.org/ 
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Fig. 3. Data, timings and statistics 



software/dionysus/) which provides implementation for persistent homology |10I13| and per- 
sistent cohomology [7_ (denoted DioCoH) with field coefficients in Z p , for any prime p. We also 
compare our implementation with the PHAT library (www.phat .googlecode . com) which provides 
an implementation of the optimized algorithm for persistent homology |3llj . with coefficients 
in Z2 only. DioCoH and PHAT have been reported to be the most efficient implementation in 
practice |6llj . Our implementation is in C++. All timings are measured on a Linux machine with 
3.00 GHz processor and 32 GB RAM. They are all averaged over 10 independent runs. The 
symbols T m means that the computation lasted more than 12 hours, and M M means that the 
computation ran out of memory. 

We use a variety of both real and synthetic datasets: Cy8 is a set of points in M 24 , sam- 
pled from the space of conformations of the cyclo-octane molecule, which is the union of two 
intersecting surfaces; S4 is a set of points sampled from the unit 4-sphere in M 5 ; Bro is a set 
of 5 x 5 high-contrast patches derived from natural images, interpreted as vectors in R 25 , from 
the Brown database; Kl is a set of points sampled from the surface of the figure eight Klein 
Bottle embedded in K 5 ; Bud is a set of points sampled from the surface of the Happy Buddha 
(http: //graphics . Stanford. edu/data/3Dscanrep/) in K 3 ; and Nep is a set of points sampled 
from the surface of the Neptune statue (http://shapes.aimatshape.net/). Datasets are listed 
in Figure |3] with details on the sets of points V, their size \V\, the ambient dimension D, the 
intrinsic dimension d of the object the sample points belong to (if known), the threshold /O m ax, 
the dimension k of the simplicial complexes and the size \K.\ of the simplicial complexes. We 
use three families of simplicial complexes [9 which are of particular interest in topological data 
analysis: the Rips complexes (denoted Rips), the relaxed witness complexes (denoted Wit) and 
the a-shapes (denoted aSh). Simplicial complexes are constructed up to embedding dimension. 
Time Performance: As Dionysus and PHAT encode explicitely the boundaries of the simplices, 
we use a Hasse diagram for implementing SC and having the same time complexity for accessing 
the boundaries of simplices. As illustrated in Figure k3] our implementation (noted CAM) out- 
performs DioCoH and PHAT on all examples. The persistent cohomology algorithm of Dionysus 
is always several times slower than our implementation. Moreover, DioCoH is very sensitive to 
non-orientability (as in the case of Cy8 and Bro) where changing the field (here from Z2 to 
Zn) changes the structure of the cohomology groups and leads to more complex calculations. 
On the other hand, our implementation CAM shows almost identical performance for Z2 and Zn 
coefficients on all examples. 

The persistent homology algorithm of PHAT shows good performance in the case of small sim- 
plicial complexes (< 21M simplices): even if CAM is always faster, the timings are sometimes close. 
However, PHAT provides only computation of persistent homology with Z2 coefficients, where the 
computation is sped up by the fact that the field operations +, -, — ,/ become straighforward, 
and the sum of two sparse vectors can be computed as a symmetric difference. Moreover, PHAT 
did not scale to bigger simplicial complexes (> 50M simplices), where the computational cost 
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Fig. 4. Statistics on the effect of the optimizations. 



increases dramatically in terms of time and memory (for example PHAT runs out of RAM after 9H 
of computation for S4). On all examples, CAM did not take more than half a minute to compute 
persistence. 

Statistics and Optimization: Figure [4] presents the effects on the computation of both com- 
pression of the matrix (from section k3| and filtration strategies (from section HI on the example 
Nep, which is of particular interest because no other method can compute its persistent homol- 
ogy: interestingly, we have noticed that the dimensions of the cohomology groups increase up 
to several thousands, for a simplicial complex with a simple topology. The left table presents 
the number of non-zero elements of the compressed annotation matrix (|M|) and the number 
of field operations in F (jfFop.) with and without the compression of the matrix (removal of 
duplicate columns). We note a reduction factor of 4.5 for the size of the matrix, and we proceed 
to 46 times less field operations with the compression. Considering Nep is 57 million simplices, 
this makes less than 1.5 field operations per simplex in average, for our implementation. The 
middle table presents the effect of the filtration strategies (section EJ on the dimensions of the 
cohomology groups and the number of distinct annotation vectors, which are two key values in 
the complexity analysis. Here G m stands for the sum of the dimensions of the cohomology groups 
and S m stands for the sum of distinct annotation vectors (for all dimension of simplices), both 
maximized over the computation. The filtration strategies reduce the dimension of the cohomol- 
ogy groups by a factor 5. Note also that S m remains close to G m . Finally, the right table presents 
the repartition of computational time between maintaining the compressed annotation matrix 
(under removal of duplicate vectors, etc, noted Mds), computing the annotation vector ago- and 
modifying the values of the elements in the compressed annotation matrix (noted M op ), when 
computing persistent cohomology with Zn and Z393 coefficients. The computational complexity 
of field operations (F, +,-,—, /,0, 1), and in particular the inversion /, varies depending of the 
field we use. As the inversion is needed only when modifying the matrix and M op represents 
a small part of the computation, our implementation is quite insensitive to the field we use. 
Specifically, when the time spent for modifying the matrix elements is multiplied by a factor 
of 3.3 when using Z393 instead of Zn, the total cost for computing persistent cohomology only 
increases by 23%. 

On all our experiments, the size of the compressed annotation matrix is negligible compare to 
the size of the simplicial complex. Consequently, combined with the simplex tree data structure [2 
for representing the simplicial complex, we have been able to compute the persistent cohomology 
of simplicial complexes of several hundred million simplices in high dimension. 



Acknowledgements 



This research is partially supported by the 7th Framework Programme for Research of the Eu- 
ropean Commission, under FET-Open grant number 255827 (CGL Computational Geometry 
Learning). This research is also partially supported by NSF (National Science Foundation, USA) 
grants CCF-1048983 and CCF-1116258 



RR n° 8195 



12 Jean-Daniel Boissonnat , Tamal K. Dey , Clement Maria 



References 

1. Ulrich Bauer, Michael Kerber, and Jan Reininghaus. Clear and compress: Computing persistent 
homology in chunks. 2013. 

2. Jean-Daniel Boissonnat and Clement Maria. The simplex tree: An efficient data structure for general 
simplicial complexes. In ESA, pages 731-742, 2012. 

3. Chao Chen and Michael Kerber. Persistent homology computation with a twist. In Proceedings 27th 
European Workshop on Computational Geometry, 2011. 

4. Chao Chen and Michael Kerber. An output-sensitive algorithm for persistent homology. Comput. 
Geom., 46(4):435-447, 2013. 

5. David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence diagrams. 
Discrete & Computational Geometry, 37(1):103-120, 2007. 

6. V. de Silva, D. Morozov, and M. Vejdemo-Johansson. Dualities in persistent (co)homology. Inverse 
Problems, 27:124003, 2011. 

7. V. de Silva, D. Morozov, and M. Vejdemo-Johansson. Persistent cohomology and circular coordinates. 
Discrete Comput. Geom., 45:737-759, 2011. 

8. Tamal K. Dey, Fengtao Fan, and Yusu Wang. Computing topological persistence for simplicial maps. 
CoRR, abs/1208.5018, 2012. 

9. Herbert Edelsbrunner and John Harer. Computational Topology - an Introduction. American Math- 
ematical Society, 2010. 

10. Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persistence and simplifi- 
cation. Discrete & Computational Geometry, 28(4):511-533, 2002. 

11. Nikola Milosavljevic, Dmitriy Morozov, and Primoz Skraba. Zigzag persistent homology in matrix 
multiplication time. In Symposium on Computational Geometry, pages 216-225, 2011. 

12. Dmitriy Morozov. Persistence algorithm takes cubic time in worst case. BioGeometry News, Dept. 
Comput. Sci., Duke Univ, 2005. 

13. Afra Zomorodian and Gunnar Carlsson. Computing persistent homology. Discrete & Computational 
Geometry, 33(2):249-274, 2005. 



Inria 



The Compressed Annotation Matrix for Persistent Cohomology 



13 



A Correctness of the Filtration Strategies 
A.l Correctness of the Lazy Evaluation 

We prove that the persistence diagram we compute with the lazy algorithm is the same as the 
persistence diagram we compute with the standard persistence algorithm. We first prove the 
lemma: 

Lemma 1. Given a filtration F — F\ ■ a ■ r • F2, the pairing of simplices remains unchanged in 
the filtration F = Fi ■ r • a ■ F 2 if a is a creator in F and a (£. t . 

Proof. Let B\ and B\ be the sets of cocycles, maintained with the annotations in the algorithm, 
representing the classes forming a basis of the cohomology group of dimension p after considering, 
respectively, the prefix F± and the prefix F\ ■ a ■ t of the filtration ordering F. Define similarly 
B\ and B\ for, respectively, the prefix F\ and the prefix F\ ■ r • <r of the filtration ordering F . We 
first prove that a and r are paired the same way in F and F. 

By hypothesis, a is a creator in F thus ag a — after processing the prefix F\ of the filtration 
F. We prove that ZQ a = after processing F\-t in F. Let da = {si, ■ ■ ■ , s\ a \} be the codimension 
1 subfaces of a. If r is a creator or the dimension of r is different from the one of a, the insertion 
of r does not modify the annotation vectors of the SjS and ag a is not changed. If r is a killer of 
same dimension as cr, and kills the cocycle of index i, the insertion of r will modify the value of 
the annotation vector of a simplex Sj from the boundary of a such that: a s . <— a s . — a s . [i]/ci • ag T , 
with Ci = ag T [i]. 



If we compute ag a after the insertion of r we get: 


a ^= E (-tf 






j=l-\<7\ L 


r -| 


= 


E (-tfa.i 


1 

d 


E (-i^w 




[j=i-\*\ 






Li=i- 


-kl 



a r = 



because a CT is before the insertion of r. Consequently, a is also a creator in the filtration F. In 
both nitrations cr creates a cocycle ^ CT with time of appearence p a . 

As a <£. t and a is a creator, the annotation ag T of the boundary of r remains unchanged whether 
cr is inserted before or after r. Consequently, r is paired the same way in F or F, and the cocycle 
it creates/kills is born/dies at the same time p T . 

We finally prove that, for any p, B\ — B\ and B% — B\. The first equality B\ — B\ is 
straighforward. The second equality B\ — B% is deduced from the fact that a and r admits the 
same pairing in both filtrations, and give the same times for births and deaths of cocycles. Thus, 
the pairing of the simplices in F\ and F2 is the same in F and F. 

We deduce: 

Proposition 2. The lazy algorithm computes the same persistence diagram as the standard al- 
gorithm for computing persistent homology. 

Proof. The shuffle of the filtration induced by the lazy evaluation can be decomposed into suc- 
cessive elementary transpositions of the form F = F\ ■ a ■ t ■ F-i —¥ F = F\ ■ r ■ a ■ F2, with a 
creator in F and cr ^ r. As such a transposition does not modify the pairing nor the time of 
birth/death of cocycle classes (lemma [T|, the output persistence diagram is the same. 
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A. 2 Correctness of the Iso-simplices Ordering 

The fact that the reordering within a set of simplices with a same filtration value does not change 
the persistence diagram of the filtered simplicial complex, while changing the pairing of simplices, 
is a direct consequence of the stability theorem of persistence diagrams: 

Theorem 1. f5jj Let X be a triangulable space with continuous tame functions /, g : X — > R. 
Then the persistence diagrams satisfy dg(D(f), D(g)) < \\f — g\\oo, where dg is the bottleneck 
distance, D(h) the persistence diagram of function h and \\ ■ W^ the L^ distance. 

As our reordering still respects the inclusion order of simplices it defines a valid filtration on the 
simplices. As the filtration values are unchanged, the persistence diagram remains the same. 
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